

PATENT  
02-CT-099/DP

## IN THE UNITED STATES PATENT AND TRADEMARK OFFICE

In re Application of: Francesco PAPPALARDO et al.  
Serial No.: 10/622,835  
Filed: July 18, 2003  
For: PIPELINE STRUCTURE  
Group Art Unit: 2183  
Confirmation No.: 4773

**CLAIM FOR PRIORITY UNDER 35 U.S.C. §119**

Commissioner for Patents  
P.O. Box 1450  
Alexandria, VA 22313-1450

Sir:

Under the provisions of 35 U.S.C. §119, there is filed herewith a certified copy of European Application No. 02-425469, filed July 19, 2002, in accordance with the International Convention for the Protection of Industrial Property, 53 Stat. 1748, under which Applicants hereby claim priority.

Respectfully submitted,

Date: 12/23/03

By: 

Stephen Bongini  
Reg. No. 40,917

FLEIT, KAIN, GIBBONS,  
GUTMAN, BONGINI & BIANCO P.L.  
551 NW 77th Street, Suite 111  
Boca Raton, Florida 33487  
Telephone: (561) 989-9811  
Facsimile: (561) 989-9812





European  
Patent Office

Certificate

The attached documents are exact copies of the European patent application described on the following page, as originally filed.

Patent application No.

02425469.0

For the President of the European Patent Office

R C van Dijk





Application no.: 02425469.0

Date of filing: 19.07.02

Applicant(s):

STMicroelectronics S.r.l.  
Via C. Olivetti, 2  
20041 Agrate Brianza (Milano)  
ITALY

Title of the invention:  
(If no title is shown, please refer to the description.)

A pipeline structure

Priority(ies) claimed:  
State/Date/File no.:

International Patent Classification:

G06F15/76

Contracting states designated at date of filing:

AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LI LU MC NL PT SE SK TR



The present invention relates to a pipeline structure for use in a digital system.

A pipeline structure consists of a sequence of functional units (stages), which perform a task in several steps. The stages work in parallel on different tasks, thus giving a higher throughput than if all the steps of a task had to be completed before starting the next task. Pipelines are commonly used in several applications, for example, to process different parts of an instruction in a microprocessor.
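To make the throughput argument concrete, the following Python sketch counts clock cycles for an idealised pipeline. It assumes, as a simplification not stated in the text, that every step takes exactly one cycle and that no stalls occur. The function names are hypothetical.

```python
def cycles_sequential(num_tasks: int, num_steps: int) -> int:
    """Cycles needed when each task completes all its steps before the next starts."""
    return num_tasks * num_steps


def cycles_pipelined(num_tasks: int, num_steps: int) -> int:
    """Cycles needed by an ideal pipeline with one stage per step.

    The first task takes num_steps cycles to traverse the pipeline; after
    that, one further task completes on every cycle.
    """
    if num_tasks == 0:
        return 0
    return num_steps + num_tasks - 1


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, cycles_sequential(n, 5), cycles_pipelined(n, 5))
```

For 100 tasks of five steps, this gives 500 cycles without pipelining versus 104 with it, so the throughput approaches one task per cycle as the number of tasks grows.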

Typically, the pipeline has a synchronous architecture. A synchronous pipeline receives a single clock signal, which controls all the stages. As a consequence, every stage must complete its work within one clock period.

A drawback of the synchronous pipeline is that all the stages switch at the same time. This causes high peaks of power consumption. The peaks come from the current absorbed by the short circuits that form while the transistors of the logic gates switch, and from the current needed to charge and discharge wires and driven capacitances. These peaks of power consumption introduce sources of noise, which can jeopardise the functionality of the whole electronic device embedding the pipeline. Moreover, they impose several constraints on the design of the power supply structure. In particular, the metal tracks used to supply the electronic device (when it is integrated in a chip of semiconductor material) must be dimensioned to withstand the aforementioned high peaks; as a consequence, an increased chip area is required to integrate the electronic device.

Asynchronous pipelines have also been proposed in order to reduce the peaks of power consumption. In an asynchronous pipeline, all the stages proceed independently, so that they do not switch at the same time. A handshaking mechanism is then used to keep every pair of adjacent stages synchronised. For this purpose, each stage generates a signal indicating the completion of its work. This signal is used to move the result of the stage to the next stage, and then to trigger the start of the next stage.

However, the implementation of the handshaking mechanism is relatively complex. Moreover, an additional circuit is required to synchronise the flow of input and output information with the outside.

It is an object of the present invention to overcome the above-mentioned drawbacks. In order to achieve this object, a structure as set out in the first claim is proposed.

Briefly, the present invention provides a pipeline structure for use in a digital system including a plurality of stages arranged in a sequence from a first stage for receiving an input of the pipeline structure to a last stage for providing an output of the pipeline structure, at least one intermediate stage being interposed between the first stage and the last stage, wherein the first stage and the last stage are controlled by a main clock signal; the pipeline structure further includes phase shifting means for generating at least one local clock signal from the main clock signal for controlling the at least one intermediate stage, the main clock signal and the at least one local clock signal being out of phase.

Moreover, the present invention provides a digital system including this pipeline structure, and an electronic device including the digital system; a corresponding method of operating a pipeline structure is also encompassed.

Further features and advantages of the solution according to the present invention will become clear from the following description of a preferred embodiment thereof, given purely by way of non-restrictive indication, with reference to the attached figures, in which:

Figure 1 is a schematic block diagram of a hand-held computer wherein the pipeline structure of the invention can be used;

Figure 2 illustrates the functional blocks of the pipeline; and

Figure 3 is a time diagram showing operation of the pipeline structure.

With reference in particular to Figure 1, a hand-held computer 100 is depicted. The hand-held computer 100, also known as a palmtop, pocket computer or Personal Digital Assistant (PDA), is a very small system that literally fits in one hand. The hand-held computer 100 is formed by several units, which are connected in parallel to a communication bus 105. In detail, a microprocessor 110 controls operation of the hand-held computer 100, a DRAM 115 is used directly as a working memory by the microprocessor 110, and a Read Only Memory (ROM) 120 stores basic code for a bootstrap of the hand-held computer 100.

Several peripheral units are further connected to the bus 105. In particular, a non-volatile memory 125, typically consisting of a flash E<sup>2</sup>PROM, operates as a solid-state mass memory for the hand-held computer 100. Moreover, the hand-held computer 100 includes input devices 130 (for example, an electronic pen or stylus) and output devices 135 (for example, a flat panel screen made with TFT technology). Interfaces 140 are used to connect external peripherals (such as a PCMCIA network card) to the hand-held computer 100.

A timing unit 145 generates a main clock signal CLK<sub>m</sub>, which is used to synchronise operation of the hand-held computer 100. A battery pack 150 provides a power supply voltage Vdd for all the units of the hand-held computer 100, so that the hand-held computer 100 can run without being plugged in.

The microprocessor 110 has a pipeline architecture, wherein a sequence of stages simultaneously processes different parts of the instructions to be executed by the microprocessor 110. In particular, a first stage fetches the instruction (from the DRAM 115), a second stage decodes the instruction, a third stage fetches the respective arguments (if any), a fourth stage executes the operations required by the instruction, and a fifth stage stores a possible result. In this way, while one instruction is being executed, the arguments of the next instruction are being fetched, the instruction after that is being decoded, and a further one is being fetched. For maximum performance, the pipeline requires a continuous stream of instructions; therefore, this technique is commonly combined with instruction prefetch in an attempt to keep the pipeline busy.
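The overlap of the five instruction stages can be tabulated with a short Python sketch. The stage labels are hypothetical shorthand for the five stages listed above. The sketch assumes one cycle per stage and no stalls, an idealisation that instruction prefetch only approximates.

```python
# Hypothetical shorthand for the five stages described in the text.
STAGES = ["fetch", "decode", "arguments", "execute", "store"]


def schedule(num_instructions: int) -> list[dict[str, int]]:
    """Return, for each clock cycle, a mapping stage name -> instruction index.

    Instruction k enters stage s at cycle k + s (one cycle per stage, no stalls).
    """
    total_cycles = num_instructions + len(STAGES) - 1 if num_instructions else 0
    table = []
    for cycle in range(total_cycles):
        row = {}
        for s, name in enumerate(STAGES):
            instr = cycle - s
            if 0 <= instr < num_instructions:
                row[name] = instr
        table.append(row)
    return table


if __name__ == "__main__":
    for cycle, row in enumerate(schedule(3)):
        print(cycle, row)
```

With three instructions, cycle 3 holds instruction 0 in the execute stage, instruction 1 in the argument-fetch stage and instruction 2 in the decode stage. No fourth instruction exists to be fetched.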

Similar considerations apply if the hand-held computer has a different structure or includes other units (for example, an infrared port), if the pipeline is formed by a different number of stages, if no prefetch is implemented, if each stage performs other functions, and the like. Alternatively, the pipeline is used in the microprocessor of a laptop computer, in a mobile telephone, in a memory (wherein data is saved in a stack while the next data is being accessed), or more generally in any other digital system.

Considering now Figure 2, a structure 200 of the pipeline used in the microprocessor of the hand-held computer is depicted. The pipeline 200 is formed by $N=5$ stages $ST_i$ (with $i=1,\dots,N$). Each stage $ST_i$ includes a register $R_i$ and a combinatorial circuit $C_i$, except for the last stage $ST_5$, which only has the register $R_5$ without any combinatorial circuit. The combinatorial circuit $C_i$ is cascade connected to the respective register $R_i$. The register $R_1$ (of the first stage $ST_1$) and the register $R_5$ (of the last stage $ST_5$) define an input and an output, respectively, of the pipeline 200.

An input word $IN$ (for example, of 32 bits) received by the pipeline 200 is stored into the register $R_1$ (as a word $IN_1$). Each register $R_i$ (with the exception of the last one) operates as an input buffer for the respective combinatorial circuit $C_i$. The combinatorial circuit $C_i$ processes a word $IN_i$ provided by the register $R_i$, and generates a result consisting of a word $OUT_i$. The combinatorial circuit $C_i$ has a propagation time $P_i$, defined as the delay for obtaining the word $OUT_i$ from the word $IN_i$. The output of the combinatorial circuit $C_i$ is then stored into the next register $R_{i+1}$, so that $IN_{i+1}=OUT_i$. The word stored in the last register $R_5$ ($OUT_4$) is sent to the outside as an output word $OUT$ of the pipeline 200.
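The data path relation $IN_{i+1}=OUT_i=C_i(IN_i)$ can be sketched functionally in Python, ignoring timing entirely. The combinatorial functions below are hypothetical stand-ins for $C_1$-$C_4$, and words are masked to 32 bits to match the example width.

```python
from typing import Callable

MASK32 = 0xFFFF_FFFF  # 32-bit words, as in the example given in the text


def register_contents(word_in: int, circuits: list[Callable[[int], int]]) -> list[int]:
    """Return [IN_1, IN_2, ..., IN_N] for one word travelling through the pipeline.

    IN_1 is the input word IN; each combinatorial circuit C_i maps IN_i to
    OUT_i, which becomes IN_{i+1}. The last entry is the output word OUT.
    """
    words = [word_in & MASK32]
    for circuit in circuits:
        words.append(circuit(words[-1]) & MASK32)
    return words


if __name__ == "__main__":
    # Hypothetical stand-ins for C_1..C_4 (N = 5 registers, 4 circuits).
    c = [lambda x: x + 1, lambda x: x << 1, lambda x: x ^ 0xFF, lambda x: x >> 2]
    print(register_contents(5, c))
```

With the input word 5, the five registers end up holding `[5, 6, 12, 243, 60]`, and the last entry plays the role of $OUT$.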

Operation of the pipeline 200 is controlled by the main clock signal $CLK_m$. In particular, each register $R_i$ has a control terminal, which is used to trigger loading of the word supplied to its input (word $IN$ for the register $R_1$ and word $IN_i$ for the other registers $R_2-R_5$). The first register $R_1$ and the last register $R_5$ are controlled by the main clock signal $CLK_m$ directly. The other registers $R_2-R_4$ (of the intermediate stages $ST_2-ST_4$) are controlled by local clock signals $CLK_2-CLK_4$, respectively. The local clock signals $CLK_2-CLK_4$ are generated from the main clock signal $CLK_m$ by a phase shifting circuit, which consists of a delay block $D_i$ for each intermediate stage $ST_i$. The block $D_i$ generates the corresponding local clock signal $CLK_i$ by applying a preset delay $d_i$ to the clock signal controlling the next stage $ST_{i+1}$. In other words, the local clock signals $CLK_2$, $CLK_3$ and $CLK_4$ are generated by delaying the clock signals $CLK_3$, $CLK_4$ and $CLK_m$, respectively. The delay blocks $D_2-D_4$ ensure that the main clock signal $CLK_m$ and every local clock signal $CLK_i$ are out of phase. As a result, the registers $R_2-R_4$ of the intermediate stages never switch at the same time as the registers $R_1$ and $R_5$, which are both driven by the main clock signal $CLK_m$.
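Because each local clock is derived from the clock of the next stage, the delay of $CLK_i$ with respect to $CLK_m$ is the cumulative sum $d_i + d_{i+1} + \dots + d_{N-1}$. The Python sketch below computes these offsets and checks that no local clock shares a rising edge with $CLK_m$. The delay values and period are hypothetical examples.

```python
def clock_offsets(delays: dict[int, float], n_stages: int) -> dict[int, float]:
    """Delay of the clock driving each register R_i, measured from CLK_m.

    R_1 and R_N are driven by CLK_m directly (offset 0). For an intermediate
    stage i, CLK_i is CLK_{i+1} delayed by d_i, so the offsets accumulate from
    the last intermediate stage backwards: offset_i = d_i + ... + d_{N-1}.
    """
    offsets = {1: 0.0, n_stages: 0.0}
    running = 0.0
    for i in range(n_stages - 1, 1, -1):  # i = N-1, ..., 2
        running += delays[i]
        offsets[i] = running
    return offsets


def out_of_phase_with_main(offsets: dict[int, float], period: float, n_stages: int) -> bool:
    """True if no local clock CLK_2..CLK_{N-1} has a rising edge in common with CLK_m."""
    return all(offsets[i] % period != 0 for i in range(2, n_stages))


if __name__ == "__main__":
    # Hypothetical delays d_2, d_3, d_4 (in ns) and main clock period T = 10 ns.
    offs = clock_offsets({2: 1.0, 3: 2.0, 4: 3.0}, n_stages=5)
    print(offs, out_of_phase_with_main(offs, 10.0, 5))
```

With these values, $CLK_4$, $CLK_3$ and $CLK_2$ lag $CLK_m$ by 3, 5 and 6 ns. If the cumulative delay of a local clock reached a whole multiple of the period, that clock would line up with $CLK_m$ again, which the check reports as `False`.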

Similar considerations apply if the pipeline includes a different number of stages (down to three), if the word consists of a different number of bits, if the registers are replaced with equivalent buffers, if a further combinatorial circuit is connected to the last register, if the first register is missing, and the like.

Operation of the pipeline described above is shown in the simplified time diagram of Figure 3. The various signals switch at the rising edge of the respective clock signal ($CLK_m$, $CLK_2-CLK_4$). Each word is represented by a band, whose crossing points define the switching times. $T_1$, $T_2$, $T_3$ and $T_4$ denote consecutive rising edges of the main clock signal $CLK_m$. The input word $IN$ is loaded into the first register $R_1$ (word $IN_1$) at the time $T_1$, in response to the rising edge of the main clock signal $CLK_m$. The word $IN_1$ is processed by the combinatorial circuit $C_1$. The output of the combinatorial circuit $C_1$ (word $OUT_1$) is stored into the second register $R_2$ (word $IN_2$) at the next rising edge of the local clock signal $CLK_2$ (time $T_1+d_4+d_3+d_2$). In a similar manner, the output of the combinatorial circuit $C_2$ (word $OUT_2$) is stored into the third register $R_3$ (word $IN_3$) at the next rising edge of the local clock signal $CLK_3$ (time $T_2+d_4+d_3$). The output of the combinatorial circuit $C_3$ (word $OUT_3$) is likewise stored into the fourth register $R_4$ (word $IN_4$) at the next rising edge of the local clock signal $CLK_4$ (time $T_3+d_4$). The word $IN_4$ is then processed by the combinatorial circuit $C_4$. The output of the combinatorial circuit $C_4$ (word $OUT_4$) is stored into the last register $R_5$, providing the output word $OUT$, at the next rising edge of the main clock signal $CLK_m$ (time $T_4$). Therefore, three clock periods ($T_4-T_1$) are needed for a word to pass through the entire pipeline, that is, to get the output word $OUT$ corresponding to the input word $IN$.
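The switching sequence just described can be reproduced with a small Python sketch. Each register captures the word at the first rising edge of its own clock that is strictly later than the switch of the previous register. The sketch assumes integer times in ns, positive delays, and a total delay $d_2+\dots+d_{N-1}$ shorter than the period $T$; the function name and numeric values are hypothetical.

```python
def switching_times(t_first: int, period: int, delays: dict[int, int], n_stages: int) -> list[int]:
    """Times at which one word is loaded into R_1, R_2, ..., R_N.

    Assumes t_first is a rising edge of CLK_m, all delays are positive and
    their sum is below the period, so each register captures the word at the
    first rising edge of its own clock strictly after the previous switch.
    """
    # Offset of each register's clock from CLK_m (0 for R_1 and R_N).
    offsets = {1: 0, n_stages: 0}
    running = 0
    for i in range(n_stages - 1, 1, -1):
        running += delays[i]
        offsets[i] = running

    times = [t_first]
    for i in range(2, n_stages + 1):
        # Smallest k * period + offset that is strictly later than the previous switch.
        k = (times[-1] - offsets[i]) // period + 1
        times.append(k * period + offsets[i])
    return times


if __name__ == "__main__":
    # Hypothetical values: T = 10 ns, d_2 = 1 ns, d_3 = 2 ns, d_4 = 3 ns.
    t = switching_times(0, 10, {2: 1, 3: 2, 4: 3}, 5)
    print(t, "latency:", t[-1] - t[0])
```

With these values, the word reaches $R_1$ to $R_5$ at 0, 6, 15, 23 and 30 ns. These times match $T_1$, $T_1+d_4+d_3+d_2$, $T_2+d_4+d_3$, $T_3+d_4$ and $T_4$, and they give a latency of three periods.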

Correct operation of the pipeline requires that a word is not overwritten in a register before it has been used by the next combinatorial circuit. In particular, a generic word $IN_i$ is supplied to the combinatorial circuit $C_i$ as soon as it is loaded into the corresponding register $R_i$. The combinatorial circuit $C_i$ generates the resulting word $OUT_i$ after the respective propagation time $P_i$. To ensure that the combinatorial circuit $C_i$ has completed its work before the word $OUT_i$ is stored in the next register $R_{i+1}$, the difference between the switching times of the registers $R_{i+1}$ and $R_i$ must not be lower than the propagation time $P_i$ of the combinatorial circuit $C_i$.

Considering in particular the first stage $ST_1$, the register $R_1$ switches at every rising edge of the main clock signal $CLK_m$ (for example, $T_1$), and the second register $R_2$ switches at the time $T_1 + d_4 + d_3 + d_2 = T_1 + \sum_{j=2}^{N-1} d_j$. Therefore, the following relation must be met:

$$T_1 + \sum_{j=2}^{N-1} d_j - T_1 \geq P_1$$

$$\sum_{j=2}^{N-1} d_j \geq P_1$$

Let $T_m$ denote the time of a generic rising edge of the main clock signal $CLK_m$. The register $R_i$ of any intermediate stage (from $ST_2$ to $ST_4$) switches at the time $T_m + \sum_{j=i}^{N-1} d_j$. The next register $R_{i+1}$ switches at the time $T_{m+1} + \sum_{j=i+1}^{N-1} d_j = T_m + T + \sum_{j=i+1}^{N-1} d_j$, where $T$ is the period of the main clock signal $CLK_m$ and the sum is empty (zero) for $i=N-1$. Therefore, the constraint applicable to every intermediate stage is:

$$T_m + T + \sum_{j=i+1}^{N-1} d_j - (T_m + \sum_{j=i}^{N-1} d_j) \geq P_i$$

$$T - d_i \geq P_i$$
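The two constraints derived so far can be checked mechanically: $\sum_{j=2}^{N-1} d_j \geq P_1$ for the first stage, and $T - d_i \geq P_i$ for $i = 2,\dots,N-1$. For $i=N-1$, the next register is $R_N$, which is driven by $CLK_m$. The Python sketch below returns the indices of the violated constraints. The function name and the numeric values in the usage are hypothetical.

```python
def timing_violations(period: float, delays: dict[int, float],
                      props: dict[int, float], n_stages: int) -> list[int]:
    """Return the indices i of the circuits C_i whose timing constraint fails.

    First stage:    d_2 + ... + d_{N-1} >= P_1
    Stages 2..N-1:  T - d_i >= P_i  (for i = N-1 the next register is R_N)
    """
    violations = []
    if sum(delays[j] for j in range(2, n_stages)) < props[1]:
        violations.append(1)
    for i in range(2, n_stages):
        if period - delays[i] < props[i]:
            violations.append(i)
    return violations


if __name__ == "__main__":
    # Hypothetical: T = 10 ns, d_2..d_4 = 1, 2, 3 ns, P_1..P_4 = 5, 8, 8, 7 ns.
    print(timing_violations(10, {2: 1, 3: 2, 4: 3}, {1: 5, 2: 8, 3: 8, 4: 7}, 5))
```

The sketch shows the trade-off in choosing the delays. Larger $d_i$ give the first combinatorial circuit more time, but they shorten the window $T - d_i$ available to the circuit $C_i$.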

Finally, the register $R_4$ switches at the time $T_3 + d_4$ and the register $R_5$ switches at the time $T_4 = T_3 + T$, so that the following condition must be met for the transfer from the stage $ST_4$ to the last stage $ST_5$:

$$T_3 + T - (T_3 + d_4) \geq P_4$$

$$T - d_4 \geq P_4$$
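Adding up the gaps between consecutive register switches used above gives a quick consistency check of the three-period latency stated for Figure 3. Summing the constraints gives a necessary condition on the clock period. Both relations are derived here from the equations above; neither is an additional requirement stated in the original text.

$$\sum_{j=2}^{N-1} d_j + \sum_{i=2}^{N-1} \left(T - d_i\right) = (N-2)\,T \qquad\Rightarrow\qquad T_4 - T_1 = 3T \quad \text{for } N=5$$

$$\sum_{i=1}^{N-1} P_i \;\leq\; \sum_{j=2}^{N-1} d_j + \sum_{i=2}^{N-1} \left(T - d_i\right) = (N-2)\,T$$

For comparison, a conventional synchronous pipeline whose $N$ registers are all driven by $CLK_m$ needs $(N-1)\,T$ to carry a word from $R_1$ to $R_N$, that is $4T$ for $N=5$.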

Similar considerations apply if a different timing is envisaged for the pipeline, if the signals are strobed two or more clock periods after their switching, if the difference between the switching times of adjacent registers is greater than the clock period, and the like.

More generally, the present invention proposes a pipeline structure for use in a digital system. The pipeline structure includes a plurality of stages arranged in a sequence from a first stage (for receiving an input of the pipeline structure) to a last stage (for providing an output of the pipeline structure); one or more intermediate stages are interposed between the first stage and the last stage. The first stage and the last stage are controlled by a main clock signal. In the pipeline structure of the invention, phase shifting means are provided for generating one or more local clock signals (from the main clock signal) for controlling the intermediate stages; the main clock signal and the local clock signals are out of phase.

The proposed solution strongly reduces the peaks of power consumption in the pipeline structure, so fewer sources of noise are introduced. Moreover, the constraints on the design of the power supply structure of the whole electronic device embedding the pipeline are relaxed. In particular, the metal tracks used to supply the electronic device (when it is integrated in a chip of semiconductor material) may be smaller; as a consequence, a reduced chip area is required to integrate the electronic device.
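As a crude illustration of why the peaks drop, the Python sketch below counts how many registers load at the same instant within one clock period. It only counts register load events and ignores the switching activity of the combinatorial circuits and its dependence on the data, so it is a rough proxy for current peaks, not a power model. The clock offsets are hypothetical values, derived from delays of 1, 2 and 3 ns with a 10 ns period.

```python
from collections import Counter


def peak_simultaneous_loads(offsets: list[int], period: int) -> int:
    """Largest number of registers loading at the same instant within one period."""
    return max(Counter(o % period for o in offsets).values())


if __name__ == "__main__":
    # Offsets (ns) of the clocks of R_1..R_5 relative to CLK_m; T = 10 ns.
    synchronous = [0, 0, 0, 0, 0]     # all registers on CLK_m
    phase_shifted = [0, 6, 5, 3, 0]   # hypothetical d_2 = 1, d_3 = 2, d_4 = 3
    print(peak_simultaneous_loads(synchronous, 10),
          peak_simultaneous_loads(phase_shifted, 10))
```

In the synchronous case all five registers load together. With the phase-shifted clocks, only $R_1$ and $R_5$ coincide, because both are driven by $CLK_m$.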

This result is achieved with a very simple architecture, without any handshaking mechanism among the stages of the pipeline.

In addition, the pipeline of the invention maintains a synchronous interface with the outside for the flow of input and output information. In particular, the proposed solution makes it possible to reduce the number of clock periods required to pass through the entire pipeline (compared with the synchronous pipeline known in the art), although different timings are not excluded.

The preferred embodiment of the invention described above offers further advantages.

For example, the pipeline has multiple intermediate stages, each one controlled by a corresponding local clock signal, with all the local clock signals being out of phase with one another.

This feature further reduces the peaks of power consumption, since all the intermediate stages switch at different times.

Preferably, each local clock signal is obtained by delaying the clock signal controlling an adjacent stage.

The proposed structure is very simple, but at the same time effective.

As a further enhancement, each delay block receives as input the clock signal of the next stage.

This solution makes it possible to ensure correct operation of the pipeline with shorter delays than if the local clock signals were obtained from the clock signal of the previous stage.

Alternatively, the local clock signals are not all out of phase, two or more stages are controlled by the same local clock signal, the pipeline includes a single intermediate stage, each local clock signal is obtained by delaying another clock signal (for example, the one controlling the previous stage), or different phase shifting means are envisaged.

In particular, each intermediate stage includes a functional unit and a buffer; the functional unit has a propagation time lower than the phase difference between the corresponding clock signal and the clock signal controlling the next stage.

This structure better exploits the advantageous effects of the present invention, while ensuring correct operation of the pipeline.

Preferably, each stage consists of a combinatorial circuit and a corresponding buffer (storing a word).

In this way, the peaks of power consumption are reduced to the minimum.

However, the solution according to the invention lends itself to being implemented in a pipeline wherein each register consists of a stack with a depth of two or more words, or even in a pipeline having a different architecture (for example, consisting of a simple shift register without any combinatorial circuit).

Typically, the pipeline of the invention is used in a digital system.

The improvement provided by the synchronous interface of the proposed pipeline is clearly perceived in a digital system of the synchronous type.

Moreover, the solution according to the present invention is particularly advantageous in an electronic device that is supplied by a battery, where power consumption is a very critical issue.

However, the pipeline of the invention is also suitable for use in a different digital system (even of the asynchronous type), and in any other electronic device (for example, one supplied by mains electricity).

Naturally, in order to satisfy local and specific requirements, a person skilled in the art may apply many modifications and alterations to the solution described above, all of which, however, are included within the scope of protection of the invention as defined by the following claims.



CLAIMS

1. A pipeline structure (200) for use in a digital system (110) including a plurality of stages (ST<sub>i</sub>) arranged in a sequence from a first stage (ST<sub>1</sub>) for receiving an input of the pipeline structure to a last stage (ST<sub>5</sub>) for providing an output of the pipeline structure, at least one intermediate stage (ST<sub>2</sub>-ST<sub>4</sub>) being interposed between the first stage and the last stage, wherein the first stage and the last stage are controlled by a main clock signal (CLK<sub>m</sub>), characterized in that

the pipeline structure further includes phase shifting means (D<sub>2</sub>-D<sub>4</sub>) for generating at least one local clock signal (CLK<sub>2</sub>-CLK<sub>4</sub>) from the main clock signal for controlling the at least one intermediate stage, the main clock signal and the at least one local clock signal being out of phase.

2. The pipeline structure (200) according to claim 1, wherein the at least one intermediate stage consists of a plurality of intermediate stages (ST<sub>2</sub>-ST<sub>4</sub>) each one controlled by a corresponding local clock signal (CLK<sub>2</sub>-CLK<sub>4</sub>), the local clock signals being out of phase.

3. The pipeline structure (200) according to claim 1 or 2, wherein for each intermediate stage (ST<sub>2</sub>, ST<sub>3</sub>, ST<sub>4</sub>) the phase shifting means includes a delay block (D<sub>2</sub>, D<sub>3</sub>, D<sub>4</sub>) for obtaining the corresponding local clock signal (CLK<sub>2</sub>, CLK<sub>3</sub>, CLK<sub>4</sub>) from the clock signal (CLK<sub>3</sub>, CLK<sub>4</sub>, CLK<sub>m</sub>) controlling an adjacent stage (ST<sub>3</sub>, ST<sub>4</sub>, ST<sub>5</sub>) in the sequence.

4. The pipeline structure (200) according to claim 3, wherein each delay block (D<sub>2</sub>, D<sub>3</sub>, D<sub>4</sub>) is connected to obtain the local clock signal (CLK<sub>2</sub>, CLK<sub>3</sub>, CLK<sub>4</sub>) controlling the corresponding intermediate stage (ST<sub>2</sub>, ST<sub>3</sub>, ST<sub>4</sub>) from the clock signal (CLK<sub>3</sub>, CLK<sub>4</sub>, CLK<sub>m</sub>) controlling a next stage (ST<sub>3</sub>, ST<sub>4</sub>, ST<sub>5</sub>) in the sequence.

5. The pipeline structure (200) according to claim 4, wherein each intermediate stage (ST<sub>1</sub>-ST<sub>4</sub>) includes a functional unit (C<sub>1</sub>-C<sub>4</sub>) cascade connected to a buffer (R<sub>1</sub>-R<sub>4</sub>), the buffer storing an output of the functional unit of a previous stage in the sequence responsive to the corresponding clock signal (CLK<sub>m</sub>, CLK<sub>2</sub>-CLK<sub>4</sub>) and the functional unit having a propagation time lower than the phase difference between the corresponding clock signal and the clock signal (CLK<sub>2</sub>-CLK<sub>4</sub>, CLK<sub>m</sub>) controlling the next stage (ST<sub>2</sub>-ST<sub>5</sub>).

6. The pipeline structure (200) according to claim 5, wherein each functional unit consists of a combinatorial circuit (C<sub>1</sub>-C<sub>4</sub>) and each buffer consists of a register (R<sub>1</sub>-R<sub>4</sub>) for storing a word.

7. A digital system (110) including the pipeline structure (200) according to any claim from 1 to 6.

8. The digital system (110) according to claim 7, wherein the digital system is of the synchronous type.

9. An electronic device (100) including the digital system (110) according to claim 7 or 8, and a battery (150) for supplying the digital system (110).

10. A method of operating a pipeline structure (200) for use in a digital system (110) including a plurality of stages (ST<sub>i</sub>) arranged in a sequence from a first stage (ST<sub>1</sub>) for receiving an input of the pipeline structure to a last stage (ST<sub>5</sub>) for providing an output of the pipeline structure, at least one intermediate stage (ST<sub>2</sub>-ST<sub>4</sub>) being interposed between the first stage and the last stage, wherein the method includes the steps of:

controlling the first stage and the last stage by means of a main clock signal (CLK<sub>m</sub>),

characterized by the steps of

generating at least one local clock signal (CLK<sub>2</sub>-CLK<sub>4</sub>) from the main clock signal, the main clock signal and the at least one local clock signal being out of phase, and

controlling the at least one intermediate stage by means of the at least one local clock signal.

ABSTRACT

## A PIPELINE STRUCTURE

A pipeline structure (200) for use in a digital system (110) is proposed. The pipeline structure includes a plurality of stages (ST<sub>i</sub>) arranged in a sequence from a first stage (ST<sub>1</sub>) for receiving an input of the pipeline structure to a last stage (ST<sub>5</sub>) for providing an output of the pipeline structure, at least one intermediate stage (ST<sub>2</sub>-ST<sub>4</sub>) being interposed between the first stage and the last stage, wherein the first stage and the last stage are controlled by a main clock signal (CLK<sub>m</sub>); the pipeline structure further includes phase shifting means (D<sub>2</sub>-D<sub>4</sub>) for generating at least one local clock signal (CLK<sub>2</sub>-CLK<sub>4</sub>) from the main clock signal for controlling the at least one intermediate stage, the main clock signal and the at least one local clock signal being out of phase.





**FIG.1**




FIG.2



FIG.3

